Listing of the Claims: 



1. (currently amended) A method implemented in a computer system, for clustering a 
string, the string including a plurality of characters, the method including: 

identifying R unique n-grams T1...R in the string; 
for every unique n-gram Ts: 

if the frequency of Ts in a set of n-gram statistics is not greater than a first 
threshold: 

associating the string with a cluster associated with Ts; 
otherwise: 

for every other n-gram Tv in the string T1...R, except s: 

if the frequency of n-gram Tv is greater than the first threshold: 

if the frequency of n-gram pair Ts-Tv is not greater than a second 
threshold: 

associating the string with a cluster associated with the n-gram pair 
Ts-Tv; 

otherwise: 

for every other n-gram Tx in the string T1...R, except s and v: 

associating the string with a cluster associated with the n-gram 
triple Ts-Tv-Tx; 

otherwise: 
do nothing. 
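
The steps recited in claim 1 can be sketched in Python as follows. This is an illustrative reading, not the claimed implementation. The function names, the character-level n-gram extraction, the dictionary-based statistics, and the choice to key pairs and triples as sorted tuples (so that Ts-Tv and Tv-Ts name the same cluster) are all assumptions made for the example. An n-gram or pair missing from the statistics is treated as having frequency 0, which is also an assumption.

```python
def unique_ngrams(s, n=3):
    """Return the unique character n-grams of s in order of first appearance."""
    grams = []
    for i in range(len(s) - n + 1):
        g = s[i:i + n]
        if g not in grams:
            grams.append(g)
    return grams


def cluster_keys(s, gram_freq, pair_freq, t1, t2, n=3):
    """Return the set of cluster keys the string s is associated with.

    gram_freq maps an n-gram to its frequency; pair_freq maps a sorted
    2-tuple of n-grams to the pair's frequency (hypothetical layouts).
    t1 and t2 are the first and second thresholds.
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:
        if gram_freq.get(ts, 0) <= t1:
            # Low-frequency n-gram: it identifies a cluster on its own.
            keys.add((ts,))
            continue
        for tv in grams:
            if tv == ts:
                continue
            if gram_freq.get(tv, 0) > t1:
                pair = tuple(sorted((ts, tv)))
                if pair_freq.get(pair, 0) <= t2:
                    # Low-frequency pair of high-frequency n-grams.
                    keys.add(pair)
                else:
                    # Pair is still too common: fall back to triples.
                    for tx in grams:
                        if tx not in (ts, tv):
                            keys.add(tuple(sorted((ts, tv, tx))))
            # otherwise: do nothing
    return keys
```

For example, with 2-grams of "abcd", a rare "ab", and a rare pair of the common n-grams "bc" and "cd", the string lands in the single-n-gram cluster and the pair cluster; raising the pair's frequency above the second threshold moves it to a triple cluster instead.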

2. (original) The method of claim 1 further including compiling n-gram statistics. 

3. (original) The method of claim 1 further including compiling n-gram pair statistics. 
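
Claims 2 and 3 recite compiling n-gram statistics and n-gram pair statistics without specifying how. One possible sketch is shown here, assuming that "frequency" means the number of strings in a corpus that contain the n-gram (or the unordered pair) at least once. That definition, the function names, and the use of sorted tuples as pair keys are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def ngrams_of(s, n=3):
    """Return the set of unique character n-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def compile_statistics(strings, n=3):
    """Count, per n-gram and per unordered n-gram pair, how many strings contain it."""
    gram_freq = Counter()
    pair_freq = Counter()
    for s in strings:
        grams = sorted(ngrams_of(s, n))
        gram_freq.update(grams)
        # combinations() over a sorted list yields each pair once, in sorted order.
        pair_freq.update(combinations(grams, 2))
    return gram_freq, pair_freq
```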

4. (currently amended) A method implemented in a computer system, for clustering a 
plurality of strings, each string including a plurality of characters, the method including: 

identifying unique n-grams in each string; 

associating each string with zero or more clusters associated with low frequency 
n-grams from that string, if any; and 

associating each string with zero or more clusters associated with low-frequency pairs 
of high frequency n-grams from that string, if any. 

5. (original) The method of claim 4 further including: 

where a string does not include any low-frequency pairs of high frequency n-grams, 
associating that string with clusters associated with triples of n-grams including 
the pair. 
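
A minimal sketch of how claims 4 and 5 might be applied to a plurality of strings is given here. It assumes precomputed frequency dictionaries, sorted-tuple cluster keys, and one particular reading of claim 5: when a string yields no low-frequency pair, each pair of its high-frequency n-grams is extended with every remaining n-gram in the string to form triples. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_strings(strings, gram_freq, pair_freq, t1, t2, n=3):
    """Map each cluster key (a tuple of n-grams) to the strings assigned to it."""
    clusters = defaultdict(list)
    for s in strings:
        grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
        low = [g for g in grams if gram_freq.get(g, 0) <= t1]
        high = [g for g in grams if gram_freq.get(g, 0) > t1]

        # Clusters for low-frequency n-grams, if any.
        keys = {(g,) for g in low}

        # Clusters for low-frequency pairs of high-frequency n-grams, if any.
        high_pairs = list(combinations(high, 2))
        low_pairs = [p for p in high_pairs if pair_freq.get(p, 0) <= t2]
        keys.update(low_pairs)

        # Assumed reading of claim 5: no low-frequency pair, so use triples.
        if not low_pairs:
            for a, b in high_pairs:
                for x in grams:
                    if x not in (a, b):
                        keys.add(tuple(sorted((a, b, x))))

        for key in keys:
            clusters[key].append(s)
    return dict(clusters)
```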

6. (currently amended) A method implemented in a computer system, for clustering a 
string, the string including a plurality of characters, the method including: 

identifying R unique n-grams T1...R in the string; 
for every unique n-gram Ts: 

if the frequency of Ts in a set of n-gram statistics is not greater than a first 
threshold: 

associating the string with a cluster associated with Ts; 
otherwise: 

for i = 1 to Y: 

for every unique set of i n-grams Tu in the string T1...R, except s: 

if the frequency of the n-gram set Ts-Tu is not greater than a second 
threshold: 

associating the string with a cluster associated with the n-gram 
set Ts-Tu; 

if the string has not been associated with a cluster with this value of Ts: 
for every unique set of Y+1 n-grams Tuy in the string T1...R, except s: 

associating the string with a cluster associated with the Y+2 n-gram 
group Ts-Tuy. 

7. (original) The method of claim 6 where Y = 1. 
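
The generalized loop of claim 6, with Y as a parameter (Y = 1 in claim 7), might be sketched as follows. The function name, the sorted-tuple keys for n-gram groups, the dictionary layout of the group statistics, and treating a missing entry as frequency 0 are assumptions for illustration and are not taken from the claims.

```python
from itertools import combinations


def cluster_keys_general(s, gram_freq, group_freq, t1, t2, y=1, n=3):
    """Return cluster keys for s, using groups of up to y+1 n-grams, then y+2."""
    grams = []
    for i in range(len(s) - n + 1):
        g = s[i:i + n]
        if g not in grams:
            grams.append(g)

    keys = set()
    for ts in grams:
        if gram_freq.get(ts, 0) <= t1:
            keys.add((ts,))
            continue
        others = [g for g in grams if g != ts]
        matched = False
        for i in range(1, y + 1):
            # Every unique set of i n-grams Tu in the string, except Ts.
            for tu in combinations(others, i):
                group = tuple(sorted((ts,) + tu))
                if group_freq.get(group, 0) <= t2:
                    keys.add(group)
                    matched = True
        if not matched:
            # No cluster found for this Ts: fall back to groups of y+2 n-grams.
            for tuy in combinations(others, y + 1):
                keys.add(tuple(sorted((ts,) + tuy)))
    return keys
```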

8. (original) The method of claim 6 further including compiling n-gram statistics. 

9. (original) The method of claim 6 further including compiling n-gram group statistics. 

10. (original) A computer program, stored on a tangible storage medium, for use in 
clustering a string, the program including executable instructions that cause a computer to: 

identify R unique n-grams T1...R in the string; 
for every unique n-gram Ts: 

if the frequency of Ts in a set of n-gram statistics is not greater than a first 
threshold: 

associate the string with a cluster associated with Ts; 
otherwise: 

for every other n-gram Tv in the string T1...R, except s: 
if the frequency of n-gram Tv is greater than the first threshold: 

if the frequency of n-gram pair Ts-Tv is not greater than a second 
threshold: 

associate the string with a cluster associated with the n-gram pair 
Ts-Tv; 

otherwise: 

for every other n-gram Tx in the string T1...R, except s and v: 

associate the string with a cluster associated with the n-gram 
triple Ts-Tv-Tx; 

otherwise: 

do nothing. 

11. (original) The computer program of claim 10 further including executable instructions 
that cause a computer to compile n-gram statistics. 

12. (original) The computer program of claim 10 further including executable instructions 
that cause a computer to compile n-gram pair statistics. 